Language Editing Dataset of Academic Texts

Author

  • Vidas Daudaravicius
Abstract

We describe the VTeX Language Editing Dataset of Academic Texts (LEDAT), a dataset of text extracts from scientific papers that were edited by professional native English language editors at VTeX. The goal of the LEDAT is to provide a large data resource for the development of language evaluation and grammatical error correction systems for the scientific community. We describe the data collection and compilation process of the LEDAT. The new dataset can be used in many NLP studies and applications that require deeper knowledge of academic language and language editing. It can also serve as a knowledge base of academic English to support writers of scientific papers.

Similar resources

Writers on the Move: Visualizing Composing Processes Involved in Academic Writing

The present research study aimed to explore covert processes of editing and revision involved in writing four different academic text genres (i.e. abstract, conclusion, data commentary, and cover letter) in English. To this end, six EFL learners with Persian as their mother tongue were recruited to participate in this study. All the participants attended an induction session and ea...

From Academic to Journalistic Texts: A Qualitative Analysis of the Evaluative Language of Science

This study examined academic articles and journalistic reports in 5 disciplinary areas to explore how similar contents might attitudinally be realized in two different genres. To this end, 25 research articles and 210 news reports were carefully selected and underwent detailed discourse semantic and grammatical analyses with the purpose of identifying the evaluative linguistic patterns....

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...

Teaching Academic Vocabulary Through Reconstruction Editing Task: Does Group Size Matter?

The use of collaborative classroom interactional tasks has been on the rise recently, since they incorporate the negotiation of meaning and may thus be regarded as one of the most efficient ways to ease a learner’s focus on form. This study investigated the immediate and long-term effects of reconstruction editing task on the learning of 20 academic vocabulary items through using five reconstruct...

Persian Speakers’ Recognition of English Relative Clauses: The Effects of Enhanced Input vs. Explicit Feedback Types

Despite consensus in focus on form (FOF) instruction over the facilitative role of noticing, controversy has not quelled over ways of directing EFL learners’ attention towards formal features via implicit techniques like input-enhancement or explicit metacognitive feedback and interactive peer-editing on the output they produce. This quasi-experimental study investigated the impact of input enh...



Publication date: 2014